Answer the following questions:
In this activity, you will:
O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapter 7. John Wiley & Sons: New Jersey.
For this activity you will need the following:
An R markdown notebook version of this document (the source file).
A package called geog4ga3.
It is good practice to clear the working space to make sure that you do not have extraneous items there when you begin your work. The command in R to clear the workspace is rm (for “remove”), followed by a list of items to be removed. To clear the workspace from all objects, do the following:
rm(list = ls())
Note that ls() lists all objects currently on the worspace.
Load the libraries you will use in this activity. In addition to tidyverse, you will need sf and geog4ga3:
library(tidyverse)
[30m-- [1mAttaching packages[22m --------------------------------------- tidyverse 1.2.1 --[39m
[30m[32mv[30m [34mggplot2[30m 3.1.0 [32mv[30m [34mpurrr [30m 0.2.5
[32mv[30m [34mtibble [30m 2.0.1 [32mv[30m [34mdplyr [30m 0.7.8
[32mv[30m [34mtidyr [30m 0.8.2 [32mv[30m [34mstringr[30m 1.3.1
[32mv[30m [34mreadr [30m 1.3.1 [32mv[30m [34mforcats[30m 0.3.0[39m
[30m-- [1mConflicts[22m ------------------------------------------ tidyverse_conflicts() --
[31mx[30m [34mdplyr[30m::[32mfilter()[30m masks [34mstats[30m::filter()
[31mx[30m [34mdplyr[30m::[32mlag()[30m masks [34mstats[30m::lag()[39m
library(sf)
Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(spdep)
Loading required package: sp
Loading required package: Matrix
Attaching package: 㤼㸱Matrix㤼㸲
The following object is masked from 㤼㸱package:tidyr㤼㸲:
expand
Loading required package: spData
To access larger datasets in this package, install the spDataLarge package with:
`install.packages('spDataLarge', repos='https://nowosad.github.io/drat/', type='source')`
library(geog4ga3)
Begin by loading the data files you will use in this activity:
data("HamiltonDAs")
data("trips_by_mode")
data("travel_time_car")
HamiltonDAs are the Dissemination Areas for Hamilton CMA, which coincide with the Traffic Analysis Zones (TAZ) of the Transportation Tomorrow Survey of 2011. The dataframe trips_by_mode includes the number of trips by mode of transportation by TAZ (equivalently DA), as well as other useful information from the 2011 census for Hamilton CMA. Finally, the dataframe travel_time_car includes the travel distance/time from TAZ/DA centroids to Jackson Square in downtown Hamilton.
The data for this activity were retrieved from the 2011 Transportation Tomorrow Survey TTS, the periodic travel survey of the Greater Toronto and Hamilton Area, as well as data from the 2011 Canadian Census Census Program.
Before beginning the activity, join the information on trips and travel time to the sf object. Note that to complete the join, the identifier (in this case GTA06) must be in the same format in both data frames:
travel_time_car$GTA06 <- factor(travel_time_car$GTA06)
# Travel time
HamiltonDAs <- left_join(HamiltonDAs, travel_time_car, by = "GTA06")
# Trips by mode
HamiltonDAs <- left_join(HamiltonDAs, trips_by_mode, by = "GTA06")
Column `GTA06` joining factor and character vector, coercing into character vector
The analysis will be based on travel by car in the Hamilton CMA. Calculate the proportion of trips by car by TAZ:
HamiltonDAs <- mutate(HamiltonDAs, Auto_driver.prop = Auto_driver / (Auto_driver + Cycle + Walk))
Note that the proportion of people who travelled by car as passengers are not included in the denominator of the proportion! This is because every trip as a passenger is already included in trips with one driver.
summary(HamiltonDAs)
ID GTA06 VAR1 VAR2 VAR3 VAR4
2299 : 1 Length:297 Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
2300 : 1 Class :character 1st Qu.:0.3680 1st Qu.:0.3800 1st Qu.:0.3521 1st Qu.:0.2989
2301 : 1 Mode :character Median :0.5345 Median :0.4937 Median :0.5699 Median :0.5476
2302 : 1 Mean :0.5241 Mean :0.4966 Mean :0.5548 Mean :0.5325
2303 : 1 3rd Qu.:0.6938 3rd Qu.:0.6091 3rd Qu.:0.7378 3rd Qu.:0.7894
2304 : 1 Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
(Other):291
VAR5 group from to m km
Min. :0.0000 Min. :4050 Length:297 Length:297 Min. : 493 Min. : 0.493
1st Qu.:0.2998 1st Qu.:5023 Class :character Class :character 1st Qu.: 6614 1st Qu.: 6.614
Median :0.4810 Median :5106 Mode :character Mode :character Median :12509 Median :12.509
Mean :0.5001 Mean :4985 Mean :13522 Mean :13.522
3rd Qu.:0.6915 3rd Qu.:5188 3rd Qu.:19231 3rd Qu.:19.231
Max. :1.0000 Max. :6020 Max. :37558 Max. :37.558
miles seconds minutes hours Cycle Auto_driver
Min. : 0.3064 Min. : 115 Min. : 1.917 Min. :0.03194 Min. : 0.00 Min. : 16
1st Qu.: 4.1099 1st Qu.: 810 1st Qu.:13.500 1st Qu.:0.22500 1st Qu.: 0.00 1st Qu.: 1200
Median : 7.7731 Median :1093 Median :18.217 Median :0.30361 Median : 0.00 Median : 2771
Mean : 8.4020 Mean :1064 Mean :17.733 Mean :0.29556 Mean : 30.15 Mean : 3398
3rd Qu.:11.9501 3rd Qu.:1346 3rd Qu.:22.433 3rd Qu.:0.37389 3rd Qu.: 40.00 3rd Qu.: 4635
Max. :23.3385 Max. :2100 Max. :35.000 Max. :0.58333 Max. :623.00 Max. :17743
Auto_passenger Walk Population Worked_in_2010_Full-time Worked_in_2010_Part-time
Min. : 0.0 Min. : 0 Min. : 38.88 Min. : 0.0 Min. : 0.0
1st Qu.: 249.0 1st Qu.: 0 1st Qu.: 1043.84 1st Qu.: 309.9 1st Qu.: 81.8
Median : 651.0 Median : 91 Median : 1996.86 Median : 765.4 Median : 203.6
Mean : 853.8 Mean : 217 Mean : 2349.66 Mean : 866.8 Mean : 245.4
3rd Qu.:1158.0 3rd Qu.: 332 3rd Qu.: 3165.93 3rd Qu.:1164.9 3rd Qu.: 350.8
Max. :4321.0 Max. :1599 Max. :12770.55 Max. :5925.9 Max. :1661.2
Worked_at_home Pop_Density Median_Age Family_Size_2 Family_Size_3
Min. : 0.00 Min. : 26.21 Min. : 3.845 Min. : 7.25 Min. : 3.237
1st Qu.: 19.13 1st Qu.: 255.79 1st Qu.:37.031 1st Qu.: 145.45 1st Qu.: 57.333
Median : 52.75 Median : 1571.20 Median :41.986 Median : 276.25 Median :117.072
Mean : 66.37 Mean : 2122.26 Mean :41.144 Mean : 313.18 Mean :145.974
3rd Qu.: 88.49 3rd Qu.: 3183.92 3rd Qu.:45.012 3rd Qu.: 432.92 3rd Qu.:209.947
Max. :559.98 Max. :14232.57 Max. :56.850 Max. :1489.03 Max. :859.090
Family_Size_4 Family_Size_5_more Median_income Average_income Employment_rate
Min. : 1.62 Min. : 1.617 Min. : 9.5 Min. : 11.45 Min. :32.75
1st Qu.: 49.04 1st Qu.: 21.564 1st Qu.:28564.2 1st Qu.:36029.63 1st Qu.:53.67
Median : 120.03 Median : 45.069 Median :33155.0 Median :42550.46 Median :60.50
Mean : 141.84 Mean : 59.872 Mean :33463.6 Mean :43232.64 Mean :60.28
3rd Qu.: 184.96 3rd Qu.: 77.197 3rd Qu.:40190.3 3rd Qu.:50126.03 3rd Qu.:66.50
Max. :1281.18 Max. :387.375 Max. :52496.1 Max. :81235.73 Max. :76.70
Unemployment_rate Median_commuting_duration Auto_driver.prop geometry
Min. : 0.001258 Min. :15.41 Min. :0.6681 MULTIPOLYGON :297
1st Qu.: 5.151013 1st Qu.:20.40 1st Qu.:0.9127 epsg:26917 : 0
Median : 6.600000 Median :20.61 Median :0.9643 +proj=utm ...: 0
Mean : 7.553307 Mean :21.94 Mean :0.9450
3rd Qu.: 9.328250 3rd Qu.:21.45 3rd Qu.:1.0000
Max. :23.200001 Max. :30.60 Max. :1.0000
Auto_driver.prop, and use Moran’s I to test for spatial autocorrelation. What do you find? Is autocorrelation in this variable a sign that autocorrelation will be an issue in regression analysis?ggplot(data = HamiltonDAs) +
geom_sf(aes(fill = Auto_driver.prop)) +
scale_fill_distiller(direction = 1)
Create listw
HamiltonDAs.w <- nb2listw(poly2nb(pl = as(HamiltonDAs, "Spatial")))
Moran’s I:
moran.test(HamiltonDAs$Auto_driver.prop, HamiltonDAs.w)
Moran I test under randomisation
data: HamiltonDAs$Auto_driver.prop
weights: HamiltonDAs.w
Moran I statistic standard deviate = 12.944, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.436385034 -0.003378378 0.001154235
localmoran.map(HamiltonDAs, HamiltonDAs.w, "Auto_driver.prop", by = "GTA06")
Loading required package: plotly
Attaching package: 㤼㸱plotly㤼㸲
The following object is masked from 㤼㸱package:ggplot2㤼㸲:
last_plot
The following object is masked from 㤼㸱package:stats㤼㸲:
filter
The following object is masked from 㤼㸱package:graphics㤼㸲:
layout
Joining, by = "key"
Column `key` joining character vector and factor, coercing into character vectorNo trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
Pop_Density and travel time in minutes. Discuss this model.model1 <- lm(formula = Pop_Density ~ minutes, data = HamiltonDAs)
summary (model1)
Call:
lm(formula = Pop_Density ~ minutes, data = HamiltonDAs)
Residuals:
Min 1Q Median 3Q Max
-3454.9 -1294.3 -126.7 946.9 9993.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5441.82 293.04 18.57 <2e-16 ***
minutes -187.19 15.39 -12.17 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1843 on 295 degrees of freedom
Multiple R-squared: 0.3341, Adjusted R-squared: 0.3319
F-statistic: 148 on 1 and 295 DF, p-value: < 2.2e-16
ggplot(data = HamiltonDAs, aes(x = minutes, y = Pop_Density))+
geom_point() +
stat_function(fun = function(x)(5441.82 - 187.19 * x),
geom ="line", color = "blue", size = 1) +
geom_vline(xintercept = 0) + geom_hline(yintercept = 0)
#The values 5441.82 and 187.19 are taken from the summary of model 1 under coefficients. This is one exmaple of a model, other models can also be made depending on the function that you want to fit to the data.
HamiltonDAs <- mutate(HamiltonDAs, Pop_Density.log = log(Pop_Density))
model2 <- lm(formula = Pop_Density.log ~ minutes, data = HamiltonDAs)
summary (model2)
Call:
lm(formula = Pop_Density.log ~ minutes, data = HamiltonDAs)
Residuals:
Min 1Q Median 3Q Max
-3.9782 -0.8387 0.2939 0.9750 3.4271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.96725 0.21875 40.99 <2e-16 ***
minutes -0.12032 0.01148 -10.48 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.376 on 295 degrees of freedom
Multiple R-squared: 0.2712, Adjusted R-squared: 0.2687
F-statistic: 109.8 on 1 and 295 DF, p-value: < 2.2e-16
moran.test(model1$residuals, HamiltonDAs.w)
Moran I test under randomisation
data: model1$residuals
weights: HamiltonDAs.w
Moran I statistic standard deviate = 15.654, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.524580696 -0.003378378 0.001137454
moran.test(model2$residuals, HamiltonDAs.w)
Moran I test under randomisation
data: model2$residuals
weights: HamiltonDAs.w
Moran I statistic standard deviate = 17.785, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.603096925 -0.003378378 0.001162815